SPECIFICATION

Electronic Version 1.2.8 
Stylesheet Version 1.0 

Method and Apparatus to Manage Multi-Computer Demand

Cross Reference to Related Applications

The present application is related in subject matter to the following
commonly-assigned pending applications: BALANCING GRAPHICAL SHAPE DATA FOR PARALLEL
APPLICATIONS, Serial No. 09/631,764 filed August 3, 2000 and METHOD AND
APPARATUS TO MANAGE MULTI-COMPUTER SUPPLY, Serial No. 09/943,824 filed
August 31, 2001.

Background of the Invention 

[0001] Field of the Invention

[0002] The present invention is related to quantifying the multi-computer memory
demand of a physical mathematics model, and more specifically to the
multi-computer data processing of this model. More particularly, the present
invention refers to managing the demand for computer memory caused by the
formulation of large-scale scientific and engineering problems that are solved on
multi-computers.

[0003] Description of the Related Art 

[0004] A supply schedule shows the amount of a commodity that a producer is willing 
and able to supply over a period of time at various prices. A graph of the supply 
schedule is called the supply curve. The supply curve usually slopes upward from left 
to right because a higher price must be paid to a producer to induce the production of 
a commodity due to overhead costs. Thus the price of a commodity is directly
proportional to its supply; this relationship is called the law of supply.

[0005] A demand schedule shows the amount of a commodity that a consumer is willing
and able to demand over a period of time at various prices. A graph of the demand
schedule is called the demand curve. The demand curve usually slopes downward
from left to right because a lower price induces the consumer to purchase more of a
commodity. That is, the price of a commodity is inversely proportional to its
demand; this relationship is called the law of demand. The equilibrium price and
equilibrium quantity of a commodity are determined by its supply and demand. An
equilibrium schedule shows the intersection of the supply curve and the demand
curve. A graph of the equilibrium schedule is called the equilibrium curve, and
this relationship is called the law of equilibrium.

[0006] Multi-computer processing, hereafter called multi-processing, involves the use of
multiple computers to process a single computational program that could be
numerical, alphabetical, or both. Multi-processing is distinguished from
uni-processing, where a single computer is used to process an application program,
and refers to the use of a parallel or a distributed computer. The programming of a
multi-computer is referred to as parallel programming.

[0007] One method of multi-processing is by the use of the Message Passing Interface
(MPI), which is a communication library for multi-computer communication. The
defining feature of the message passing model is that the transfer of data from the
memory of one or more computers to the local memory of another one or more
computers requires operations to be performed by all of the computers involved. Two
versions of MPI software for UNIX-like operating systems are the IBM Parallel
Operating Environment, or POE, which is a licensed IBM product, and Argonne
National Laboratory's implementation of MPI, or MPICH, which is publicly available.

[0008] The advantages and motivations for using multi-computing are, at least,
threefold. First, the potentially enormous demand for computer resources made by a
computational task is divided among multiple computers; second, the time required to
complete a variety of applications is scaled down by the number of processors used;
and third, the reliability of completing such applications is increased because of the
shorter processing time. The first of these reasons is the most important, since
without sufficient resources, no time-scaling is possible.

Brief Summary of the Invention

[0009] This invention teaches a method, media and apparatus to tabulate a description of
memory demand, then subsequently, based on the supply of multi-computer hosts,
produce a list of one or more computers to satisfy the demand along with
data-segments of the model. The properties of physical systems can be quantified by
one or more partial differential equations phrased in one of several discretized
numerical forms. The present invention addresses the problem of meeting the demand
for memory made by the formulation of such a discretized system model, and teaches a
method and apparatus to do so.

[0010] The invention calculates the density of the data needed to represent a system of
interest, where often this data represents one or more electrical, mechanical, thermal,
or optical properties. More particularly, this invention determines the memory needs of
processors in a parallel processing system by inputting a discretized model for an
application and initializing a computational domain; calculating a data density for
each control element; calculating a demand cost for each sub-domain; minimizing the
difference in average demand cost; ranking the processors by value; and generating a
data ownership table and frame file.

Brief Description of the Several Views of the Drawings 

[0011] Figure 1 shows an IC model as an example of a parallel application.

[0012] Figure 2 shows the cross-section of a chip with various metal levels.

[0013] Figure 3 shows the data flow of an embodiment of this invention.

[0014] Figure 4 is a control element, which is the result of discretization of a problem.
Data density is computed for each control element.

[0015] Figure 5 shows the "perspective" of one-, two-, and three-space data density.
Each dot pictured is a control element.

[0016] Figure 6 shows two discretized systems represented by scalar and vector fields.

[0017] Figure 7 shows the control elements, the density function and a graph of the
Demand Cost for a one dimensional system.

[0018] Figure 8 shows the control elements, the density function and a graph of the
Demand Cost for a two dimensional system.

[0019] Figure 9 shows the control elements, the density function and a graph of the
Demand Cost for a three dimensional system.

[0020] Figure 10 shows a computational domain before and after optimization of the
average Demand Cost over all sub-domains.

[0021] Figure 11 shows a graph of the density function and Demand Cost for a
sub-domain.

[0022] Figure 12 illustrates a typical computer which could be used to practice this
invention.

[0023] Figure 13 illustrates software media, a form of which can be used to store the
methodology to practice this invention.

Detailed Description of the Invention 

[0024] A computer system is described by at least four properties: central processing unit
(CPU), main memory, temporary file space, and cache memory page space. The
property of CPU is dimensionless and therefore unit-less; it is simply a count of the
number of CPUs. The remaining three properties, main memory, temporary file space,
and cache memory page space, all have the dimension of data and the units of byte.
For example, a computer may have the resources of four CPUs, 256 mega-bytes of
main memory, 500 mega-bytes of temporary file space, and 125 mega-bytes of cache
memory page space. Furthermore, this system may be comprised of several such
computers, thereby being a multi-computer.

[0025] An integrated circuit (IC), such as a microprocessor, can be described in a physical
sense as a voluminous block, with its electrical signals connected to the "top" of the
block, transferred "down" and through the block to the transistors that occupy a plane
and perform the function of the IC and rest upon a "lower" substrate, and
subsequently returned to the top of the block through a similar series of wiring
planes. See figures 1 and 2. The metal and via levels on figure 2 are represented by
the resistive supply grid and the resistive ground grid of figure 1. The wiring planes
are shared by the three signal types of an IC: power signals (supply voltages and
ground voltages), clock signals, and data/control signals. These signals pass from the
electrical contacts down to the transistors, then back to the electrical contacts, by
means of a series of superimposed and electrically insulated conductors. These planes
are a laminate of "grids" and intervening "vias," all with electrical resistance,
capacitance and inductance.

[0026] The physical structure of an IC can be expressed in a graphics language such as
GL/1 that is recorded in a file and can involve massive amounts of data. A method and
apparatus for managing this data is described in patent application Balancing
Graphical Shape Data For Parallel Applications, Serial No. 09/631,764 filed August 3,
2000, called Parallel Chip Enable, or PARCE, shown as 30 in Figure 3. In that method
the physical structure of an IC is geographically decomposed for subsequent parallel
applications. PARCE is a scalable parallel program. A metric, called data density, is
computed for the GL/1 data. Data density is instantiated in a matrix that summarizes
the bytes required to represent physical structures on the IC. Based on this matrix, the
input GL/1 file is decomposed into several smaller files called frames. The
decomposition respects the IC hierarchy, the geographic placement of IC cells, or
building blocks, as well as the balance in terms of a homogeneous network.

[0027] The apparatus for the present invention and that for the above invention are similar
in that a data density table (DDT), shown emanating from PARCE as 31, and the data
ownership table (DOT), shown emanating from the demand block 340 as 32, are used
in this invention. Those functions contained within block 30 are integrated into this
invention through block 340.

[0028] An overview of the system environment of the invention can be seen in the left side of
Figure 1, wherein is shown the file (named "Servers") 34 of all resources available in
the network for executing a parallel application 35. A server is any data processor
attached to a computer network. A server can be a printer, a personal computer, a fax
machine, a UNIX workstation 33, etc. A host is a UNIX workstation that will accept
commands from the network. Module 33 (called "Economy") generates an output file
36 (called "Pool") which lists the resources selected by Economy for executing the
parallel application 35.

[0029] The outputs of the present invention are the data density table 31, the pool file 36,
frame files 37 and ownership table 32, which reflect the computed memory
demand for the application.

[0030] Given a physical system, an IC for example, quantification can begin when the
laws governing the system are expressed in a mathematical form, usually as one or
more partial differential equations (PDEs). Such PDEs appear in electromagnetics,
thermodynamics, fluid dynamics, and heat transfer, to name a few applications.
Several techniques can solve partial differential equations, including both exact
analysis and approximate numerical methods. The type of PDE, the number of its
dimensions, the type of coordinate system used, whether the governing equations are
linear or nonlinear, and whether the problem is steady-state or transient determine the
solution technique. The numerical method of discretization is a powerful method for
solving PDEs. Discretization is a family of numerical methods whereby the continuous
results contained in the exact solution are replaced with discrete values. The
computational domain, represented by a grid, is defined over the physical domain of
the system under analysis.

[0031] Usually the physical system under analysis is irregular in shape, which results in
an irregular grid in the physical domain. The usual approach is to map the physical
domain into a regular computational domain such as squares or rectangles by means
of a transformation. This mapping is shown in figure 4. This practice lends itself to
parallel processing, since the computational domain can be partitioned along these
regular lines (herein called bisectors; see figure 10) and the resulting
sub-domains directed to multiple computers.

[0032] Thus the calculation domain is divided into a number of non-overlapping control
elements such that there is one control element 40 surrounding each grid point n of
finite element grid 41. So a discretized linear-space looks like a 1 x n or m x 1 line of
control elements, shown as 50 in figure 5A; a discretized area-space looks like an m x
n plane of control elements, shown as 51 in figure 5B; and a discretized volume-space
looks like an m x n x p cube of control elements, shown as 52 in figure 5C. Within each
control element, or grid point, one or more PDEs are defined, using combinations of
the above mentioned dimensions.



[0033] The primary goal of the present invention is to quantify the amount of
computer memory storage needed to express the discretized form of a system. The
present invention computes the amount of memory needed to formulate a system to
solve the discretized problem. An amount of memory storage is referred to as
demand, which is huge for many scientific and engineering problems. Since the
discretized grid varies across its problem space, so does its demand. The present
invention allows for this demand to be distributed to multiple computers, and so
solves the problem of poor computational performance, and possible failure, when
the resource demand exceeds the computational resource supply. An advantage of the
present invention is that it quantifies this demand as a function of space to guarantee
that this demand does not exceed its supply.

[0034] An understanding of the present invention begins with the notion of data. The 
data associated with a system is related to the change in the media of a system with 
respect to space and time. Thus a system that demonstrates change has more data
than a system of the same space that demonstrates less change.

[0035] Next is the notion of a data point in a system. To illustrate a point, consider figure
5. The perspective is as if one were "looking down" at the space. Each square or small
cube represents a control volume surrounding a node point, or control element. These 
node points are a result of a numerical discretization. 

[0036] This introduces the concept of bisectors. Bisectors are defined as mathematical
boundaries within the control space and are perpendicular to the axis or axes of the
space. For example, as illustrated in figure 5, the bisectors of the x_1 axis cut across
the x_2 axis. Both horizontal bisectors and vertical bisectors are possible.



[0037] Consider a physical space with axes x_1 and x_2 as shown in figure 10. x_1 is
partitioned every n integer length units, and x_2 every m integer length units. This m
x n area is referred to as the initial computational domain, or simply the domain. The
m x n grid marks the sub-domains 101. The domain is further partitioned into i
rows and j columns. Now the sub-domains are consistently partitioned and are
separated by horizontal bisectors and vertical bisectors. Horizontal bisectors run east
and west across the domain, while vertical bisectors run north and south, dividing the
domain into sub-domains of size i x j. This provides an m x n area (the domain)
divided into i x j boxes (the control elements of the sub-domains 101). Note that
each sub-domain contains many control volumes; each control volume contains a
node point. These are depicted as dots 100 in figure 10.

[0038] Data density is defined by this invention as 

[0039] data density = data/point x point/space = data/space. 

[0040] The data at any of these points is a simple number, a vector, a matrix, or any sort
of data structure suitable to represent the discretized model. (Note that this invention is
not concerned with the actual value of the discretization, only with its storage
requirement.) Graph 60 of figure 6 illustrates this idea, where the system is shown
with a uniform grid, and the discretization is a scalar field, so that the data/point is a
scalar such as a floating-point number. Computers generally offer several types of
data storage to suit the nature and numerical precision requirements of the
application. These include character, integer, floating-point, and double precision.
Each has an associated amount of memory storage. Graph 61 of figure 6 is the same
system, but illustrating a "vector field." The vector field will have a greater data
density than the scalar field. Thus the factor data/point is independent of the size of
the system. The factor point/space, on the other hand, depends on the size of the
system. Note that while figure 5 shows a uniform data density, it need not be, in
which case the spacing of the points would vary accordingly. Figures 7, 8, and 9
illustrate some cases where the data density is non-uniform.
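
The product data/point x point/space can be illustrated with a short sketch. The function name data_density, the 8-byte storage size per value, and the point counts below are assumptions chosen for illustration and do not come from this specification.

```python
def data_density(bytes_per_point: int, points: int, space: float) -> float:
    """data density = data/point x point/space = data/space (bytes per unit space)."""
    if space <= 0:
        raise ValueError("space must be positive")
    return bytes_per_point * (points / space)


# Assumed storage: one 8-byte double-precision value per field component.
scalar_bytes = 8        # scalar field: one value at each point
vector_bytes = 3 * 8    # vector field: three components at each point

# Same grid (100 points over 25 units of space), different data/point.
scalar_density = data_density(scalar_bytes, points=100, space=25.0)
vector_density = data_density(vector_bytes, points=100, space=25.0)
```

With the same grid, only the data/point factor differs, so the vector field has the greater data density, as stated for graphs 60 and 61.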

[0041] Helpful in understanding this invention is the description of the demand function. 
The demand function is that fraction of node points actually used in a given 
subdivision over the maximum number of node points possible if a fine grid is used
uniformly throughout the space. That is

[0042] demand(space) = refine(space)/fine(space) 

[0043] where refine is the non-uniform grid that is based on the space rate of change of
data, and fine is the uniform grid based on the discretization described above. Thus

[0044] $0 \le \mathrm{demand} \le 1$

[0045] since when refine = fine then refine/fine = 1. Because of the non-uniformity of
refine with respect to space, the demand function will vary as shown in the graphs at
the bottom of figures 7, 8, and 9. Next, it helps to define a unit-less ratio $\sigma$, where

[0046] $\sigma = \mathrm{space}/\mathrm{Period}_S$ so that $0 \le \sigma \le 1$.

[0047] So

[0048] $\mathrm{demand}(\sigma) = \mathrm{refine}(\sigma)/\mathrm{fine}$.

[0049] Also helpful to understanding this invention is a unit-less metric called
DemandCost, where

[0050] $0 \le \mathrm{DemandCost} \le 1$

[0051] DemandCost is used in the invention to rank a problem in terms of its data
density. DemandCost is directly proportional to demand, and a problem's space-rate
of change is directly proportional to computational density. That is

[0052] $\frac{d}{d\sigma}\,\mathrm{DemandCost} = \mathrm{Density}(\sigma)$

[0053] which is integrated to evaluate the demand cost of a sub-domain over a
space-period S. Thus

[0054] $\mathrm{DemandCost}(\sigma) = \int \mathrm{Density}(\sigma)\, d\sigma$.

[0055] The density function and demand cost for a sub-domain are illustrated in figure 11.

[0056] In order to satisfy the discretized demand, supply properties are examined and
hosts are ranked in terms described below and as described in Method and Apparatus
to Manage Multi-Computer Supply, Serial No. 09/943,824, filed August 31, 2001. The
assignment of hosts to satisfy sub-domain demand is a mapping of highest
sub-domain demand to highest-ranked hosts, while ensuring that the resources of the
host selected can accommodate the sub-domain demand. In theory, this is the
equilibrium point of the supply-demand model described by the preferred embodiment
of this invention. The equilibrium point is the point at which the supply cost curve
and the demand cost curve intersect, thus satisfying demand in an optimal manner.

[0057] Supply is defined as an "ordered-set" of scalars representing a ratio of the
following properties of a computer:

[0058] Capacity = < CPU, memory, temporary disk space, cache >

[0059] and

[0060] Utilization = < CPU(t), memory(t), temporary disk space(t), cache(t) >

[0061] Capacity is the physical configuration of a computer and is considered to be
constant over long periods of time. It represents the availability of these properties.
Utilization is variable and represents the use of these properties that vary with time.
The properties have the corresponding dimensions and units as summarized in the
following table. All units are greater than zero.

[0062]

[0063] Supply($\sigma$) is defined to be the normalized and, so, unit-less difference of Capacity
and Utilization($\sigma$)

[0064] $\mathrm{Supply}(\sigma) = (\mathrm{Capacity} - \mathrm{Utilization}(\sigma))/\mathrm{Capacity}$.
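
As a minimal sketch of the normalized difference (Capacity - Utilization)/Capacity, the example below applies it property by property to the four-element ordered set. The class name Resources, the function name supply, and the utilization figures are hypothetical; the capacity figures reuse the example computer described earlier (four CPUs, 256, 500 and 125 mega-bytes).

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class Resources:
    cpu: float         # count of CPUs (unit-less)
    memory: float      # main memory, mega-bytes
    temp_space: float  # temporary file space, mega-bytes
    cache: float       # cache memory page space, mega-bytes


def supply(capacity: Resources, utilization: Resources) -> Resources:
    """Element-wise (Capacity - Utilization) / Capacity, a unit-less ratio."""
    ratios = []
    for cap, used in zip(astuple(capacity), astuple(utilization)):
        if cap <= 0:
            raise ValueError("capacity properties must be greater than zero")
        ratios.append((cap - used) / cap)
    return Resources(*ratios)


capacity = Resources(cpu=4, memory=256, temp_space=500, cache=125)
# Hypothetical snapshot of current use, for illustration only.
utilization = Resources(cpu=1, memory=64, temp_space=125, cache=25)
available = supply(capacity, utilization)
```

Each resulting element lies between zero (fully used) and one (idle), provided utilization does not exceed capacity.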

[0065] The SupplyCost function is defined as 

[0066] $\mathrm{SupplyCost}(\sigma) = \int \mathrm{Supply}(\sigma)\, d\sigma$.

[0067] SupplyCost ranks a computer in terms of its properties over time, where
$0 \le \mathrm{SupplyCost} \le 1$.

[0068] The following describes how these functions are utilized in the method and
apparatus of this invention.

[0069] Referring to figure 3, the scientific or engineering model is input into PARCE 30
and the computational domains are initialized. The discretized model, which has been
transformed into a regular grid, resides on computer disk storage as a file, and so
must be loaded into memory. The format of this file is a fixed-length record,
where each record describes a node, its coordinates, and the "value" of the
discretization at that node, along with its neighbor nodes and their coordinates. This
format makes the distribution of the file simple in a multi-processor implementation
of the present invention. The number of multi-processors would be "User" defined,
labeled P_0, P_1, ..., P_{n-1}. Each processor reads an equal fraction of the total number
of records in the file, where the number of file fractions equals the number of
processors used.
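
Because the records are fixed-length, each processor can compute its own fraction of the file without coordination. The sketch below shows one way this might be done; the function names, the 64-byte record length, and the handling of a remainder when the record count is not evenly divisible are assumptions, not details given in this specification.

```python
def record_range(rank: int, nprocs: int, total_records: int) -> tuple[int, int]:
    """Half-open range [start, stop) of record indices read by processor `rank`.

    Any remainder is spread over the lowest-numbered processors (an assumption).
    """
    base, extra = divmod(total_records, nprocs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def byte_offset(record_index: int, record_length: int) -> int:
    """File position of a record; valid only because records are fixed-length."""
    return record_index * record_length


RECORD_LENGTH = 64  # hypothetical record size in bytes

# Twelve processors sharing 1200 records: processor 3 reads records 300..399.
start, stop = record_range(rank=3, nprocs=12, total_records=1200)
seek_to = byte_offset(start, RECORD_LENGTH)

# Uneven case: 10 records over 4 processors.
uneven = [record_range(r, 4, 10) for r in range(4)]
```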

[0070] The initial computational domain is divided into a number of equal sized
geographic sub-domains with respect to the space coordinates of the model. This is
illustrated in figure 10, which, as an example, uses twelve processors labeled 0, 1, ...,
11. This division is based on the number of processors selected by the user, and done
by means of the bisectors, which are mathematical separations between groups of
nodes. Figure 10 shows that the computational domain has two horizontal bisectors
and two vertical bisectors. Each of the grid points, indicated as dots 100, is a control
element.

[0071] Each processor simultaneously reads its fraction of the file. If the node is
"owned" (as defined by the sub-domain) by the processor that reads that node,
then the node-name and its coordinates are loaded into a data structure that
represents that sub-domain. A count of the total nodes loaded into that sub-domain
is maintained as the points/space term that is needed to compute data density. If the
node read is owned by another processor, then this information is communicated
using MPI to the processor that owns it. This process continues until each processor
completes reading its fraction of the file.
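
The ownership test can be sketched as a mapping from node coordinates to a processor number. The example assumes equal-sized initial sub-domains numbered in row-major order, which is an assumption of this sketch; the function names owner and route are hypothetical, and the MPI transfer is represented only by an "outbox" of nodes grouped by destination.

```python
def owner(x: float, y: float, domain_width: float, domain_height: float,
          cols: int, rows: int) -> int:
    """Processor number owning the equal-sized sub-domain that contains (x, y)."""
    col = min(int(x * cols / domain_width), cols - 1)   # clamp the far edge
    row = min(int(y * rows / domain_height), rows - 1)
    return row * cols + col                             # row-major numbering


def route(nodes, reader_rank: int, **grid):
    """Split the nodes read by one processor into locally owned and to-be-sent."""
    local, outbox = [], {}
    for name, x, y in nodes:
        dest = owner(x, y, **grid)
        if dest == reader_rank:
            local.append(name)          # loaded into this sub-domain's structure
        else:
            outbox.setdefault(dest, []).append(name)  # would be sent via MPI
    return local, outbox


grid = dict(domain_width=40.0, domain_height=30.0, cols=4, rows=3)  # 12 processors
nodes = [("n0", 1.0, 1.0), ("n1", 35.0, 25.0), ("n2", 12.0, 2.0)]
local, outbox = route(nodes, reader_rank=0, **grid)
points_in_subdomain = len(local)  # running count for the points/space term
```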

[0072] Each sub-domain is itself divided into an integer fraction of rows and/or columns.
Data density varies non-uniformly across each sub-domain because the overall
domain has more data in some places than in others. For the case of linear data
demand, the demand equation is of the form

[0073] $\mathrm{demand}(\mathrm{linear}) = \mathrm{vector}[i] / (\text{total number of grid points}), \quad 0 \le i \le n-1$

[0074] where vector[i] is a vector that records the use of a grid point for a data point and
i is the sub-domain number.

[0075] The data density for each control element is then calculated in PARCE 30. Figure 7
is an example of the use of this equation for the case of five grid points. On the top
left-hand side of the figure is a one-space, labeled x_1, with five nodes numbered 0
through 4. Note that figure 7 is an example of the one-space data density shown in
figure 5, and that a data-point occupies each "node" location on the axis. Since each
of the five nodes is occupied by a data point, there are five fractions to evaluate:

[0076] 1/5, 1/5, 1/5, 1/5, 1/5

[0077] Note that the sum of the fine, or uniform, linear demand equals one. On the top
right-hand side of figure 7 is an example of the refined form of linear data density.
Note that nodes 2 and 4 do not have a data point, so the corresponding values apply:

[0078] 1/5, 1/5, 0, 1/5, 0

[0079] These demand values are represented as a continuous graph at the bottom of
figure 7. Note that the curve dips to zero at the points on the curve that correspond to
zero values on the refined linear space model. The DemandCost of this example is the
area under the demand curve. Note that the cost of a fine demand is one since the
area under its curve is one.
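
The five-node example can be reproduced with exact fractions. In this sketch the cost is taken as the sum of the per-node demand values, an assumption chosen because it agrees with the statement that the cost of a fine demand is one; the function name linear_demand is hypothetical.

```python
from fractions import Fraction


def linear_demand(occupied: list[bool]) -> list[Fraction]:
    """Per-node demand: 1/total where a data point is present, else 0."""
    total = len(occupied)
    return [Fraction(1 if used else 0, total) for used in occupied]


# Fine (uniform) grid: all five nodes carry a data point.
fine = linear_demand([True, True, True, True, True])

# Refined grid: nodes 2 and 4 carry no data point.
refined = linear_demand([True, True, False, True, False])

fine_cost = sum(fine)        # cost of the fine demand
refined_cost = sum(refined)  # cost of the refined demand
```

Under this assumption the refined grid of the example has a cost of 3/5, which is lower than the fine cost of one.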

[0080] For the case of area data demand, the demand equation is of the form

[0081] $\mathrm{demand}(\mathrm{area}) = \sum \mathrm{points}[j][k] / (\text{total number of grid points}), \quad 0 \le j \le n-1$

[0082] where points[j][k] is a matrix that records the use of a grid point for a data
point. Figure 8 illustrates area demand, where now two axes are used: x_1 and x_2. As
before, the fine, or uniform, area data density has all nodes occupied with data points.

[0083] 5/25, 5/25, 5/25, 5/25, 5/25

[0084] Note that the sum of area fine demand equals one. At the top right-hand side of
figure 8 is the refined area density where some grid points are unoccupied by data
points. The corresponding refined data demand is

[0085] 4/25, 2/25, 5/25, 3/25, 3/25

[0086] The value of each refined factor is the sum of the data points in the column of
nodes above it. The bottom of the figure shows the density function and its cost.
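
The column sums of the area case can be sketched as follows. Only the column totals (4, 2, 5, 3, 3) are given in the text; the particular placement of occupied nodes within each column below is hypothetical, as is the function name area_demand. The matrix is indexed here as points[row][column].

```python
from fractions import Fraction


def area_demand(points: list[list[int]]) -> list[Fraction]:
    """Per-column demand: occupied nodes in the column / total grid points."""
    rows, cols = len(points), len(points[0])
    total = rows * cols
    return [Fraction(sum(points[r][c] for r in range(rows)), total)
            for c in range(cols)]


# 1 marks a grid point used for a data point, 0 an unused grid point.
# Hypothetical layout whose column totals are 4, 2, 5, 3, 3.
refined_points = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 1, 1, 1],
    [1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
fine_points = [[1] * 5 for _ in range(5)]

refined_demand = area_demand(refined_points)
fine_demand = area_demand(fine_points)
```

The same pattern extends to the volume case by summing over the corresponding plane of nodes instead of a column.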

[0087] Finally, figure 9 shows a volume data density. As before, the top left-hand side is
the fine, or uniform, data density

[0088] 25/125, 25/125, 25/125, 25/125, 25/125

[0089] Note that the sum of the volume fine demand equals one. Again at the top
right-hand side of figure 9 is the refined volume data density with its corresponding
values, where the value of each refined factor is the sum of the data points in the
corresponding plane of nodes. Thus linear, area, and volume data demand functions
represent the data density within a sub-domain.

[0090] After calculating the data density, PARCE 30 is used to calculate the demand cost
for each sub-domain. The cost of each sub-domain is the area under the density
curve. This is illustrated in figure 11. This area may be calculated by a numerical
integration method such as the trapezoid rule or Simpson's rule. See, for example,
P. A. Calter, M. A. Calter, Technical Mathematics, 4th ed., John Wiley and Sons,
International Standard Book Number (ISBN) 0-471-36887-3, 2000, pg. 1068. Because
of variations in data density, the cost also will vary across the model. Next, an average
cost for all sub-domains is determined, which may be a distributed reduce operation.
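
A minimal sketch of the trapezoid rule applied to a sampled density curve follows. The sample positions and density values are invented for illustration, and the averaging is done in a single process here; in a multi-computer run the same average could be obtained with a distributed reduce, which this sketch does not attempt.

```python
def demand_cost(density: list[float], sigma: list[float]) -> float:
    """Area under a sampled density curve using the trapezoid rule."""
    if len(density) != len(sigma) or len(sigma) < 2:
        raise ValueError("need at least two matching samples")
    area = 0.0
    for i in range(1, len(sigma)):
        width = sigma[i] - sigma[i - 1]
        area += 0.5 * (density[i - 1] + density[i]) * width
    return area


# Sample positions across one sub-domain, normalized to 0..1 (assumed).
sigma = [0.0, 0.25, 0.5, 0.75, 1.0]

# Two hypothetical sub-domains: one uniformly dense, one with gaps.
costs = [
    demand_cost([1.0, 1.0, 1.0, 1.0, 1.0], sigma),
    demand_cost([1.0, 1.0, 0.0, 1.0, 0.0], sigma),
]
average_cost = sum(costs) / len(costs)
```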

[0091] PARCE 30 is then used to minimize the difference in the average demand cost. The
object of this step is to adjust the "size" of each sub-domain so as to minimize the
difference in the average cost of the sub-domains. The result is approximately equal
data density among all sub-domains.

[0092] Sub-domain size can be adjusted by "moving" the bisectors. The vertical bisectors
are free to "move" left and right, and the horizontal bisectors are free to move up and
down. Figure 10 illustrates this process. Figure 10 shows a horizontal computational
domain 102 where the horizontal bisectors remain as continuous lines and the vertical
bisectors break up in order to form smaller sub-domains that have a cost closer to
the average cost. Likewise, figure 10 shows a vertical computational domain 103
where the vertical bisectors remain as continuous lines and the horizontal bisectors
break up in order to form smaller sub-domains that have a cost closer to the average
cost.

[0093] The process of minimizing the difference in the average cost is a minimization
problem, and is solved by optimization methods. This is done by adjusting the size of
the sub-domains, without changing the number of sub-domains, such that the
change in the average sub-domain data density is minimized. A constraint on this
process is that in one approach, the horizontal transformation, the number of
horizontal bisectors remains constant, although these bisectors can move up and
down, and the vertical bisectors can be fragmented within their row. Alternately, in a
second approach, the vertical transformation, the number of vertical bisectors remains
constant, where these bisectors can move right and left, and the horizontal bisectors
can be fragmented within their column. Data density is recomputed based on the grid
in each changed sub-domain.
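
The effect of moving bisectors can be illustrated in one dimension. The sketch below uses a simple greedy prefix-sum placement, which is a stand-in chosen for illustration and is not the optimization method of this specification; the function names and the per-element costs are hypothetical.

```python
def place_bisectors(element_costs: list[float], n_subdomains: int) -> list[int]:
    """Greedy placement: cut whenever the running cost reaches the next
    multiple of the average sub-domain cost. Returns cut indices."""
    target = sum(element_costs) / n_subdomains
    bisectors: list[int] = []
    running = 0.0
    for i, cost in enumerate(element_costs):
        running += cost
        wanted = target * (len(bisectors) + 1)
        if len(bisectors) < n_subdomains - 1 and running >= wanted:
            bisectors.append(i + 1)   # bisector sits after element i
    return bisectors


def subdomain_costs(element_costs: list[float], bisectors: list[int]) -> list[float]:
    """Total cost of each contiguous run between neighboring bisectors."""
    edges = [0, *bisectors, len(element_costs)]
    return [sum(element_costs[a:b]) for a, b in zip(edges, edges[1:])]


# Hypothetical costs of eight control elements along one row.
costs = [4, 4, 1, 1, 1, 1, 2, 2]

equal_size = subdomain_costs(costs, [4])     # bisector in the middle
moved = place_bisectors(costs, 2)            # bisector moved toward dense end
balanced = subdomain_costs(costs, moved)
```

In this example the equal-sized split gives costs of 10 and 6, while the moved bisector gives 8 and 8 with the number of sub-domains unchanged.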

[0094] The hosts are then ranked by value. Once the demand cost has been optimized,
the corresponding data density table (DDT) 31 in figure 3 is transported to the
Economy utility 33 as a file, which is "written" by one of the multi-processors after
having collected the data density from each of the multi-processors.

[0095] The Economy utility 33 uses the data density table (DDT) to assign specific
processors to each of the sub-domains by mapping the supply ranking to the demand
ranking. Economy 33 returns this assignment in the form of the data ownership table
(DOT) 32 as a file. This is shown in figure 3. The "ownership" assignment is used by
each of the processors to generate the appropriate sub-domain file or frame.
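
The mapping of the demand ranking onto the supply ranking, subject to the host being able to accommodate the sub-domain, might be sketched as below. The function name assign_hosts, the one-host-per-sub-domain rule, the use of a single memory figure as the resource check, and all numbers are assumptions made for illustration; the actual table formats are not reproduced here.

```python
def assign_hosts(demands: dict, hosts: dict) -> dict:
    """Map highest-demand sub-domains to highest-ranked hosts that can hold them.

    demands: sub-domain id -> (demand cost, memory needed)
    hosts:   host name     -> (supply cost, memory free)
    """
    by_demand = sorted(demands.items(), key=lambda kv: kv[1][0], reverse=True)
    by_supply = sorted(hosts.items(), key=lambda kv: kv[1][0], reverse=True)
    assignment, taken = {}, set()
    for sub, (_cost, needed) in by_demand:
        for host, (_supply, free) in by_supply:
            if host not in taken and free >= needed:
                assignment[sub] = host   # one row of an ownership table
                taken.add(host)
                break
        else:
            raise RuntimeError(f"no host can accommodate sub-domain {sub}")
    return assignment


# Hypothetical demand and supply rankings (memory in mega-bytes).
demands = {0: (0.9, 200), 1: (0.4, 100), 2: (0.6, 300)}
hosts = {"a": (0.8, 250), "b": (0.5, 400), "c": (0.3, 150)}
ownership = assign_hosts(demands, hosts)
```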

[0096] The generating of data frames occurs in PARCE 30. Each of the processors
simultaneously writes its frame file. The frame file 37 is that part of the original input
file (domain) comprising the sub-domain selected by the optimization method
proposed in this invention. These frames are used by the processors selected by the
Economy utility to run parallel applications as shown in figure 3.



[0097] Thus the demand of a physical problem is quantified in terms of memory (31). It is
the demand that is matched against the quantified supply of computer resources in
demand 340 to enable a parallel solution of the problem using the data ownership
table 32 and the sub-domain frames 37.

[0098] Note that in the preferred embodiment of this invention the methodology itself is
a "parallel program" which is embodied in media. In addition to the hardware/software
environment described in figure 3 above, such a method may be implemented, for
example, by operating a computer, as embodied by a digital data processing
apparatus, to execute a sequence of machine-readable instructions. These
instructions may reside in various types of signal-bearing media. Returning briefly to
figure 3 showing a simplified data-flow of the Economy program and its relation to
the PARCE program, the software module Economy is itself a parallel program
operating under the IBM Parallel Operating Environment (POE) or the Argonne National
Laboratory MPICH. Also shown are three files: Servers 34, Hosts 38, and Pool 36. The
Servers file 34 lists all the processors available within a computer network, including
computers, workstations, printers, fax machines, etc., including the "address" of each
server. The Servers file is processed to produce the Hosts file 38. This is done by one
or more computers testing the servers by means of a series of Operating System
commands, as is well known in the art (see, for example, IBM Publication SC23-4115-00
IBM AIX Command Language Reference, IBM Corporation, 1997).
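
The step from the Servers file to the Hosts file can be pictured with the following
minimal sketch. The source does not give the file formats or the specific Operating
System commands, so everything concrete here is an assumption: one "name address"
entry per line, "#" for comments, and a caller-supplied test function standing in for
the operating-system checks. The names parse_servers and build_hosts are
hypothetical.

```python
def parse_servers(text):
    """Parse a Servers listing with one 'name address' entry per line
    (assumed format); blank lines and '#' comments are skipped."""
    servers = []
    for line in text.splitlines():
        parts = line.strip().split()
        if len(parts) < 2 or parts[0].startswith("#"):
            continue
        servers.append((parts[0], parts[1]))
    return servers


def build_hosts(servers, is_usable):
    """Keep only the servers whose address passes the caller-supplied test.

    `is_usable` stands in for the operating-system checks; for example it
    could run a reachability command against the address and inspect the
    exit status.
    """
    return [(name, addr) for name, addr in servers if is_usable(addr)]


# Usage with a stand-in test that rejects the printer's address.
listing = "node1 10.0.0.1\nprinter1 10.0.0.9\n\n# spare\nnode2 10.0.0.2\n"
hosts = build_hosts(parse_servers(listing), lambda addr: addr != "10.0.0.9")
```

Passing the test in as a function keeps the filtering logic independent of whichever
commands a given installation uses to probe its servers.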

[0099] Figure 12 illustrates a typical hardware configuration of an information
handling/computer system in accordance with the invention and which preferably has
at least one processor or central processing unit (CPU) 1011.

[0100] The CPUs 1011 are interconnected via a system bus 1012 to a random access
memory (RAM) 1014, read-only memory (ROM) 1016, input/output (I/O) adapter 1018
(for connecting peripheral devices such as disk units 1021 and tape drives 1040 to
the bus 1012), user interface adapter 1022 (for connecting a keyboard 1024, mouse
1026, speaker 1028, microphone 1032, and/or other user interface device to the bus
1012), a communication adapter 1034 for connecting an information handling system
to a data processing network, the Internet, an intranet, a personal area network (PAN),
etc., and a display adapter 1036 for connecting the bus 1012 to a display device 1038
and/or printer 1039 (e.g., a digital printer or the like). Thus, this aspect of the present
invention is directed to a programmed product, comprising signal-bearing media
tangibly embodying a program of machine-readable instructions executable by a
digital data processor incorporating the CPU 1011 and hardware above, to perform
the method of the invention.

[0101] This signal-bearing media may include, for example, a RAM contained within the
CPU 1011, as represented by the fast-access storage for example. Alternatively, the
instructions may be contained in another signal-bearing media, such as a magnetic
data storage diskette 1100 (figure 13), directly or indirectly accessible by the CPU
1011.

[0102] Whether contained in the diskette 1100 (the diskette is representative of media
only and not necessarily preferred), the computer/CPU 1011, or elsewhere, the
instructions may be stored on a variety of machine-readable data storage media, such
as DASD storage (e.g., a conventional "hard drive" or a RAID array), magnetic tape,
electronic read-only memory (e.g., ROM, EPROM, or EEPROM), an optical storage
device (e.g., CD-ROM, WORM, DVD, digital optical tape, etc.), paper "punch" cards, or
other suitable signal-bearing media including transmission media such as digital and
analog communication links and wireless. In an illustrative embodiment of the
invention, the machine-readable instructions may comprise software object code,
compiled from a language such as "C", etc.

[0103] While the invention has been described in terms of a single preferred
embodiment, those skilled in the art will recognize that the invention can be practiced
with modification within the spirit and scope of the appended claims.

What is Claimed is:

[c1] 1. A method to determine the memory requirements of an application running
in parallel on the system, comprising the steps of:

inputting a model and initializing a computational domain;
calculating a data density for each control element;
calculating demand cost for each sub-domain;
minimizing the difference in average demand cost;
ranking the processors by value; and
generating a data ownership table and frame file.

[c2] 2. The method of claim 1, wherein the model is a discretized system model of a
physical system.

[c3] 3. The method of claim 1, wherein initializing a computational domain also
comprises dividing the domain into a number of equal sized geographic
sub-domains with respect to the space coordinates of the model.

[c4] 4. The method of claim 3, wherein initializing a computational domain also
comprises dividing the sub-domains into an integer fraction of rows and/or
columns.

[c5] 5. The method of claim 1 wherein data density within the sub-domains is
represented by linear, area, and volume data demand functions.

[c6] 6. The method of claim 1 wherein the demand cost is an area under a data
density curve and is calculated by a numerical integration method.

[c7] 7. The method of claim 1 wherein minimizing the difference in average demand
cost also comprises adjusting the sub-domain size by "moving" bisectors.

[c8] 8. The method of claim 7 wherein minimizing the difference in average demand
cost also comprises recomputing data density based on a grid in each size
adjusted sub-domain.



[c9] 9. A system of networked computers having a plurality of processors and an
operating system for executing a target parallel application process using at
least a subset of said plurality of processors, wherein said system includes a
method to determine the memory requirements of an application running in
parallel on the system, said method comprising:

inputting a model and initializing a computational domain;

calculating a data density for each control element;

calculating demand cost for each sub-domain;

minimizing the difference in average demand cost;

ranking the processors by value; and

generating a data ownership table and frame file.

[c10] 10. The method of claim 9, wherein the model is a discretized system model of
a physical system.

[c11] 11. The method of claim 9, wherein initializing a computational domain also
comprises dividing the domain into a number of equal sized geographic
sub-domains with respect to the space coordinates of the model.

[c12] 12. The method of claim 11, wherein initializing a computational domain also
comprises dividing the sub-domains into an integer fraction of rows and/or
columns.

[c13] 13. The method of claim 9 wherein data density within the sub-domains is
represented by linear, area, and volume data demand functions.

[c14] 14. The method of claim 9 wherein the demand cost is an area under a data
density curve and is calculated by a numerical integration method.

[c15] 15. The method of claim 9 wherein minimizing the difference in average
demand cost also comprises adjusting the sub-domain size by "moving"
bisectors.

[c16] 16. The method of claim 15 wherein minimizing the difference in average
demand cost also comprises recomputing data density based on a grid in each
size adjusted sub-domain.

[c17] 17. A signal-bearing medium tangibly embodying a program of
machine-readable instructions executable by a digital processing apparatus to determine
the memory requirements of an application running in parallel on the system,
said machine-readable instructions comprising:

inputting a model and initializing a computational domain;

calculating a data density for each control element;

calculating demand cost for each sub-domain;

minimizing the difference in average demand cost;

ranking the processors by value; and

generating a data ownership table and frame file.

[c18] 18. The method of claim 17, wherein the model is a discretized system model
of a physical system.

[c19] 19. The method of claim 17, wherein initializing a computational domain also
comprises dividing the domain into a number of equal sized geographic
sub-domains with respect to the space coordinates of the model.

[c20] 20. The method of claim 19, wherein initializing a computational domain also
comprises dividing the sub-domains into an integer fraction of rows and/or
columns.

[c21] 21. The method of claim 17 wherein data density within the sub-domains is
represented by linear, area, and volume data demand functions.

[c22] 22. The method of claim 17 wherein the demand cost is an area under a data
density curve and is calculated by a numerical integration method.

[c23] 23. The method of claim 17 wherein minimizing the difference in average
demand cost also comprises adjusting the sub-domain size by "moving"
bisectors.

[c24] 24. The method of claim 24 wherein minimizing the difference in average
demand cost also comprises recomputing data density based on a grid in each
size adjusted sub-domain.

Method and Apparatus to Manage Multi-Computer Demand

Abstract of the Disclosure 

A method, multi-computer media and apparatus that uses an economics model to
manage the demand and the resource satisfying that demand for multi-computer
memory. The present invention quantifies demand as a function of space, and
computer resource as a function of time, so that a computational system can meet an
application demand.